Construct Validity of Test Items Measuring Acquisition of
Information from Line Graphs

Abstract 

Research on the effectiveness of graphical displays for information
acquisition and retention lacks a system for classifying graph information
and generating test items to assess learning. The purpose of this study
was to validate a system based on two types of information and three types
of informational units. Results of an analysis of variance indicated
differences in learning predictable from the classification system; however,
a multitrait-multimethod matrix analysis failed to provide evidence of
trait validity for the system's informational constructs. In light of these
results, a graph information processing strategy was proposed in which
subjects utilize data point information.



Construct Validity of Test Items Measuring Acquisition of 
Information from Line Graphs 

The present study deals with the acquisition and retention of quantitative
information from a line graph stimulus. While the acquisition of quantitative
information from graphical displays is an important component of school
learning, the processes involved in such situations have been studied only
infrequently (cf. Washburne, 1927; Schutz, 1961). The present study is
particularly concerned with three aspects of learning from a line graph stimulus:
(a) the nature of the informational unit(s) processed by subjects instructed to
learn the information in the graph, (b) the relationship between the number of
informational units upon which a test item is based and accuracy of subject
performance on that item, and (c) the relationship between study time and
acquisition of information from the graph.

In attempting to measure the acquisition of information from a line graph
stimulus, the first question which arises concerns the nature of the
informational units processed by the subject. A logical distinction exists between
point and slope information. In a line graph, a unit of point information
is the value of the dependent variable associated with a specific level of
the independent variable; a unit of slope information is the change in value
of the dependent variable per unit change in the independent variable associated
with a specific, contiguous set of independent variable levels. The question
of immediate interest is whether this logical distinction is a meaningful
psychological distinction; i.e., when instructed to learn the information
in a line graph, do subjects encode point and/or slope information? If subjects
do, in fact, store point and slope information independently, then point and
slope information can be viewed as informational constructs in much the same
way that personality constructs are viewed; thus, it should be possible to
validate items measuring these informational constructs by means of
multitrait-multimethod methodology (Campbell and Fiske, 1959).
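
The point/slope distinction can be made concrete with a short sketch. The price series and the function names below are hypothetical illustrations and are not taken from the study's graph.

```python
# Hypothetical yearly prices for one company (years 1-5); not the study's data.
prices = {1: 2, 2: 4, 3: 3, 4: 6, 5: 7}

def point_unit(series, year):
    """Value of the dependent variable at one level of the independent variable."""
    return series[year]

def slope_unit(series, start_year, end_year):
    """Change in the dependent variable per unit change in the independent
    variable over a contiguous span of years."""
    if end_year <= start_year:
        raise ValueError("end_year must be later than start_year")
    return (series[end_year] - series[start_year]) / (end_year - start_year)

print(point_unit(prices, 3))     # price in year 3
print(slope_unit(prices, 1, 2))  # change per year between years 1 and 2
```

Note that a slope unit computed this way draws on two data points, whereas a point unit draws on one.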

The second question of interest concerns the relationship between the
number of informational units required for correct performance on items at
recall and accuracy of subject performance on these items. Studies by Schutz
(1961) and Washburne (1927) are tangentially related to this question, but
because of differences in procedure, task instructions, and type of item
presentation format, the studies do not lead directly to expectations for
the present experiment. However, it would seem that the greater the number
of informational units required by an item at recall, regardless of the type
of unit involved, the poorer performance should be on the item.

The third question of interest concerns the effects of study time on
information acquisition. The purpose here was to extend the research on
study time into the area of learning quantitative information from graphical
materials. It was expected, as most studies have shown, that increased study
time would result in greater acquisition. Of greatest interest, however,
were the possible interactions of study time with the type of informational
units and with the number of informational units which were required for
successful performance on the test items at recall.

Method 

Subjects. Thirty-six undergraduate education student volunteers served
as subjects in this experiment.

Materials. A multiple line graph was constructed in which the average
value per share of stock for each of three fictitious companies was plotted
for each of five successive years. Each of the three lines (one per company)
was generated randomly, subject to the following constraints: (a) one line
would show an increasing trend, (b) the second line would show a decreasing
trend, and (c) the third would show random fluctuations. To generate the
data points for the first two of these lines, the data point values were
randomly sampled from the following five strings of digits: 0-5, 1-6, 2-7, 3-8,
and 4-9. For the increasing trend line, the first digit was randomly selected
from the 0-5 interval. The next four digits were randomly selected from the
four succeeding digit strings. For the decreasing trend line, the first
digit was randomly selected from the 4-9 interval. The next four were randomly
selected from the remaining intervals in sequence. The five values for the
third line were randomly selected from the 0-9 range subject to the restriction
that there would be exactly one intersection or crossover of lines in the left,
center, and right thirds of the graph.
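
The sampling procedure for the two trend lines can be sketched as follows. This is a minimal illustration that assumes inclusive interval bounds and integer values; the function names and the fixed seed are hypothetical, and the crossover restriction on the third line is deliberately left out because the text does not specify how it was checked.

```python
import random

# Digit intervals from the text: 0-5, 1-6, 2-7, 3-8, 4-9 (inclusive bounds assumed).
INTERVALS = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

def increasing_line(rng):
    """One value per year, drawn from successively higher intervals."""
    return [rng.randint(lo, hi) for lo, hi in INTERVALS]

def decreasing_line(rng):
    """Same procedure with the intervals taken in reverse order."""
    return [rng.randint(lo, hi) for lo, hi in reversed(INTERVALS)]

def fluctuating_line(rng):
    """Five values from 0-9. The crossover restriction on the third line
    is NOT enforced here; it would need an extra rejection step."""
    return [rng.randint(0, 9) for _ in range(5)]

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
graph = {
    "increasing": increasing_line(rng),
    "decreasing": decreasing_line(rng),
    "fluctuating": fluctuating_line(rng),
}
print(graph)
```

Because adjacent intervals overlap, a line generated this way trends upward or downward overall without necessarily rising or falling at every step.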

The criterion test consisted of six subtests of eight propositions
each. Three subtests were based on point information; the rest on slope
information. Within each information type, the three subtests were based on
a single unit of information, two units arranged vertically (i.e., the price
of stock for two companies during the same year), and two units within the
same line (i.e., the price of a single company's stock for two separate years),
respectively. Following the lead of Anderson (1972), Bormuth (1970), and
Cronbach (1971), basic sentence frames were formed for each item type (see
Table I) and rules were established to generate the items in each cell.

Table I about here 

For example, the rules for the point items based on a single unit of 
information are listed below: 

1. Company names for the eight items were selected randomly
with the restriction that each company name was used at
least twice and no more than three times.

2. The year values for the eight items were chosen randomly
with the restriction that each year value was used at
least once and no more than twice.

3. The comparative (greater than-less than) was assigned
randomly to the items so that each appeared in four items
of the subtest.

4. Within the four items containing the 'greater than'
comparative, the truth value was randomly assigned such
that two propositions would be true and two would be
false. The same procedure was used for the four 'less
than' comparative items.

5. For each item, the set of stock values which would satisfy 
the truth value for that item was determined and one element 
of the set was randomly selected for inclusion in the item. 
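
The five rules above can be sketched as a small item generator. Everything not stated in the rules is an assumption made for illustration: the company names, the graph values, the 0-9 range of candidate stock values, and the sentence frame (Table I is not reproduced here, so the wording is hypothetical).

```python
import random
from collections import Counter

COMPANIES = ["A", "B", "C"]      # hypothetical company names
YEARS = [1, 2, 3, 4, 5]
CANDIDATE_VALUES = range(0, 10)  # assumed range of stock values usable in an item
# Hypothetical graph data (values kept in 1-8 so every truth value is satisfiable).
GRAPH = {"A": [1, 3, 4, 6, 8], "B": [8, 7, 5, 3, 2], "C": [4, 2, 6, 3, 5]}

def balanced_draw(rng, options, n, min_uses, max_uses):
    """Draw n options so each is used between min_uses and max_uses times."""
    pool = list(options) * min_uses
    extras = rng.sample(list(options) * (max_uses - min_uses), n - len(pool))
    pool += extras
    rng.shuffle(pool)
    return pool

def pick_value(rng, actual, comparative, truth):
    """Rule 5: choose one value from the set that yields the required truth value."""
    def holds(v):
        return actual > v if comparative == "greater than" else actual < v
    satisfying = [v for v in CANDIDATE_VALUES if holds(v) == truth]
    if not satisfying:
        raise ValueError("no candidate value satisfies this item")
    return rng.choice(satisfying)

def generate_items(rng, graph):
    companies = balanced_draw(rng, COMPANIES, 8, 2, 3)  # rule 1
    years = balanced_draw(rng, YEARS, 8, 1, 2)          # rule 2
    design = []
    for comparative in ("greater than", "less than"):   # rule 3: four items each
        truths = [True, True, False, False]             # rule 4: two true, two false
        rng.shuffle(truths)
        design += [(comparative, t) for t in truths]
    rng.shuffle(design)
    items = []
    for company, year, (comparative, truth) in zip(companies, years, design):
        actual = graph[company][year - 1]
        value = pick_value(rng, actual, comparative, truth)
        text = (f"In year {year}, the value of Company {company} stock "
                f"was {comparative} {value}.")  # hypothetical sentence frame
        items.append({"company": company, "year": year, "comparative": comparative,
                      "truth": truth, "value": value, "text": text})
    return items

items = generate_items(random.Random(1), GRAPH)
for item in items:
    print(item["text"], item["truth"])
print(Counter(item["company"] for item in items))
```

Building the balanced pools first and shuffling afterward keeps the counts exact, which would not be guaranteed by drawing each item independently.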

It is apparent from the above rules that items within each subtest were
balanced for wording of comparative (e.g., greater than-less than, more
rapidly-less rapidly, increased-decreased) and truth value. With respect
to wording of comparatives, a number of researchers (e.g., Clark, 1970;
Trabasso, 1970) have shown that positive and negative wording of test items
impose different information processing requirements on subjects with resulting
differences in performance levels. These results as well as those on
acquiescent responding suggested that items should be balanced for comparative wording
and truth value so that comparisons of interest would not be differentially
contaminated by differences in responding.

Analogous procedures were used for generating each of the five remaining
item types. The items were then randomly ordered over the test as a whole,
subject to constraints necessary for guaranteeing that the distribution of
the various item characteristics described above would be even across the test
as a whole.

The graph and test items were reproduced on standard 8 1/2" x 11" sheets
of paper and bound in a seven page test booklet. A cover sheet for subject
identification was followed by the graph. A blank sheet followed the graph
and separated it from the three pages of test propositions to prevent the
subjects from seeing the graph at test time. A final cover sheet completed
the test booklet.

Procedure. The subjects were randomly assigned in equal numbers to the
two and eight minute study time conditions. Following distribution of the
materials, instructions were read to the subjects which (a) indicated the
purpose of the study, (b) specified both the study time and test time limits,
(c) informed them that the graph could not be used as a reference once the
prescribed study time had elapsed, and (d) instructed them to answer all items.
Subjects were told they had up to 40 minutes to complete the test items. As
it turned out, no one required more than 25 minutes to complete the test
items.

Results 

The number of correct responses per item type was determined for each
subject. These data were then analyzed as a one-between, three-within
factorial analysis of variance. The between factor was study time and the
within factors were information type, number of informational units, and
wording of logical opposite pairs. Table II contains the means and standard
deviations for this analysis.

Table II about here

All four main effects were significant, while none of the interactions
was significant. The mean score in the eight minute study condition was
higher than the mean in the two minute condition, F = 10.90, df = 1/34, p < .01.
The mean score on point information items was significantly higher than the
mean on slope information items, F = 6.18, conservative df = 1/34, p < .02.
Scheffé tests on the three information unit means indicated that the mean of
single unit items was higher than the weighted means of the two unit within
occasion and two unit within group items (p < .01); however, the means of the
latter two item types were not significantly different from each other (p > .05).
The mean performance on items stated positively (greater than, increase, more
rapidly) was significantly higher than mean performance on items stated
negatively, F = 6.16, conservative df = 1/34, p < .02.

To assess the relationship between performance on the information types
and number of data points required to answer an item successfully, the six
subtest means (information type X number of units) were analyzed as a
one-between, one-within analysis of variance (time X subtest). The two main
effects were significant; the interaction was not. The number of data points
and subtest means as well as the significant comparisons by the Newman-Keuls
procedure are contained in Table III. This analysis indicated that only
the mean of the point-single unit test differed significantly from the means
of the slope-within occasion and slope-within group tests.

Table III about here 

In order to assess possible effects of response sets, the data were
reanalyzed with study time, logical opposite pairs, and truth value as the
independent variables. The only significant results were those main effects
associated with study time and logical opposite wording. The fact that all
interaction effects were nonsignificant seems to rule out acquiescence as a
possible explanation for the results obtained in the initial analysis discussed
above.

Table IV contains the multitrait-multimethod matrix with number of
informational units representing the methods, and point and slope information
being the possible constructs. Correlation coefficients appearing in the
table have been corrected for attenuation. The overall pattern of coefficients
in the matrix does not support our hypothesis that the point and slope items
included in this criterion test measure two distinct informational constructs.
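
For readers unfamiliar with the correction, the sketch below uses the classical formula, in which an observed correlation is divided by the square root of the product of the two reliabilities. The text does not state which formula was applied, so this is an assumption, as is the reading of the parenthesized diagonal entries in Table IV as reliability estimates. The observed correlations in the example are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, rel_x, rel_y):
    """Classical disattenuation: r_xy / sqrt(rel_x * rel_y)."""
    if not (0 < rel_x <= 1 and 0 < rel_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical observed correlation of .30 between two subtests whose
# reliabilities are .42 and .47 (values of the size shown in Table IV).
corrected = correct_for_attenuation(0.30, 0.42, 0.47)
print(round(corrected, 2))

# With low reliabilities, a moderate observed correlation can exceed 1
# after correction (hypothetical observed value of .45).
print(correct_for_attenuation(0.45, 0.48, 0.37) > 1)
```

When reliabilities are as low as those reported, corrected coefficients above 1 are possible, which is the situation the note to Table IV flags.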

Table IV about here 

Discussion 

The results of the initial analysis indicated significant main effects
for study time, wording, number of informational units, and informational types.
The effect of informational types suggested that the point-slope dichotomy was
a meaningful distinction; however, the multitrait-multimethod matrix failed
to support this distinction: performance on the various point and slope
subtests predicted performance on subtests both within and between these two
informational constructs.
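
One coarse way to see this pattern is to average the Table IV coefficients by block. The sketch below is only an illustration: the values are transcribed from Table IV as printed, entries flagged as exceeding 1 are entered as 1.00, and comparing block means is a simplification of the Campbell and Fiske criteria rather than the analysis the authors report.

```python
from statistics import mean

# Off-diagonal coefficients keyed by ((method, trait), (method, trait)),
# transcribed from Table IV; flagged entries (> 1) are entered as 1.00.
R = {
    (("I", "pt"), ("I", "slp")): 0.07,
    (("I", "pt"), ("II", "pt")): 0.93,
    (("I", "slp"), ("II", "pt")): 0.80,
    (("I", "pt"), ("II", "slp")): 1.00,
    (("I", "slp"), ("II", "slp")): 0.62,
    (("II", "pt"), ("II", "slp")): 1.00,
    (("I", "pt"), ("III", "pt")): 0.15,
    (("I", "slp"), ("III", "pt")): 1.00,
    (("II", "pt"), ("III", "pt")): 0.91,
    (("II", "slp"), ("III", "pt")): 0.88,
    (("I", "pt"), ("III", "slp")): 0.15,
    (("I", "slp"), ("III", "slp")): 0.76,
    (("II", "pt"), ("III", "slp")): 0.20,
    (("II", "slp"), ("III", "slp")): 0.21,
    (("III", "pt"), ("III", "slp")): 0.31,
}

def block(pair):
    """Classify a coefficient by whether trait and method are shared."""
    (m1, t1), (m2, t2) = pair
    if t1 == t2:
        return "monotrait-heteromethod"   # convergent validity values
    if m1 == m2:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

summary = {}
for pair, r in R.items():
    summary.setdefault(block(pair), []).append(r)
summary = {name: mean(values) for name, values in summary.items()}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Evidence for distinct traits would require the same-trait coefficients to stand clearly above the different-trait ones; in this transcription the same-trait mean does not exceed the heterotrait-heteromethod mean.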

An explanation for the disparate results of these two analyses may lie
in the kind of information subjects encoded and/or retrieved under the
experimental instructions and conditions of this study. It is possible that subjects
did not use slope information as defined in this study but instead used only
data point information. To answer slope items, subjects recalled point
information and then constructed slope information from the recalled points.
The reasoning which follows supports this conclusion.

Slope items are apparently more difficult than point items. If slope
performance is a function of a subject's recall of data points, then an increase
in the number of data points needed for successful performance should be
accompanied by a decrease in performance level. From Table III, it is apparent
that this inverse relationship exists; subjects' scores tend to decrease as
the number of data points increases.
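
The direction of this relationship can be checked descriptively from the six subtest means. The sketch below computes a Pearson correlation between number of data points and subtest mean; it is an illustration added here, not an analysis reported in the study, and the mean for the slope single unit subtest (6.417) is inferred from the pairwise differences printed in Table III.

```python
import math

# (number of data points, subtest mean) pairs read from Table III.
# The 6.417 entry is inferred from the printed pairwise differences.
subtests = [(4, 5.639), (3, 5.694), (2, 5.917), (2, 6.417), (2, 6.472), (1, 6.667)]

def pearson_r(pairs):
    """Pearson product-moment correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(subtests)
print(f"r = {r:.2f}")  # negative: more data points, lower mean score
```

With only six means this is a description of the trend rather than a significance test.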

Consequently, it appears that the amount of data point information may
be a more important factor than informational type in determining a subject's
performance level given the proposed information processing strategy. However,
the present findings do not rule out the possibility that under other
experimental instructions and conditions, subjects would encode slope information.
If this were the case, then the present multitrait-multimethod methodology
seems suitable for providing evidence of the encoding of slope information and
the validity of the slope informational construct.

Table III
Comparisons Among Subtest Means

Subtest (number of data points)                  Mean     X2     X3     X4     X5     X6

X1  slope, 2 units within occasion (4 points)   5.639   .055   .278   .778   .833  1.028*
X2  slope, 2 units within group (3 points)      5.694     -    .223   .723   .778   .973*
X3  point, 2 units within group (2 points)      5.917     -      -    .500   .555   .750
X4  slope, single unit (2 points)               6.417     -      -      -    .055   .250
X5  point, 2 units within occasion (2 points)   6.472     -      -      -      -    .195
X6  point, single unit (1 point)                6.667     -      -      -      -      -

*p < .01

Table IV

Multitrait-Multimethod Matrix (Number of Informational Units
as Methods, Point and Slope Information Type as Traits)

              I (single unit)    II (within group)    III (within occasion)

              pt.      slp.      pt.      slp.        pt.      slp.

I    pt.     (.42)
     slp.     .07     (.47)

II   pt.      .93      .80      (.48)
     slp.    1.00      .62      1.00*    (.37)

III  pt.      .15     1.00*      .91      .88        (.48)
     slp.     .15      .76       .20      .21         .31     (.57)

*Note: Actual corrected values greater than 1.